Add in-process fine-tuning proof of concept (LlamaTrainer) #287
Open
vaiju1981 wants to merge 1 commit into
Wire llama.cpp's ggml-opt training path into the JNI layer, mirroring upstream
examples/training/finetune.cpp: load a model, tokenize a text corpus into a
ggml-opt dataset, run llama_opt_init + llama_opt_epoch for N epochs, and write
the fine-tuned GGUF via llama_model_save_to_file.
- train_engine.{h,cpp} - self-contained native finetune(), independent of the
inference server_context (loads its own model + context; forces no-mmap and an
f32 KV cache, as training requires)
- LlamaTrainer - minimal Java entry point (static finetune(...) overloads)
- CMakeLists.txt - compile train_engine.cpp into libjllama
The ggml-opt / llama_opt symbols already link into the static libjllama with no
build-system change (verified with nm), so this is pure JNI + C++ wiring. The
finetuneNative symbol is exported, the library links and loads cleanly, and the
Java layer compiles through the strict Error Prone / NullAway pipeline.
Scope is deliberately a proof of concept: full-model fine-tuning is compute- and
memory-intensive and upstream training support is experimental. The actual
training run is exercised by a model-gated integration test that self-skips
unless -Dnet.ladenthin.llama.train.model is set. A richer FineTuner API (dataset
handling, optimizer / LoRA options, progress callbacks) can build on this base.
Summary
A proof of concept for in-process fine-tuning, wiring llama.cpp's ggml-opt training path into
the JNI layer. It mirrors upstream examples/training/finetune.cpp: load a model, tokenize a text
corpus into a ggml-opt dataset, run llama_opt_init + llama_opt_epoch for N epochs, and write the
fine-tuned GGUF via llama_model_save_to_file.
New API:
Why this is a small change
The open question for "can java-llama.cpp train like llama.cpp?" was whether the ggml-opt /
llama_opt machinery even links into our static libjllama. It does, verified with nm on the built
library: llama_opt_init, llama_opt_epoch, ggml_opt_fit, ggml_opt_dataset_init,
common_opt_dataset_init, common_opt_lr_pars, and llama_model_save_to_file are all defined (T)
symbols, and CMake already links llama-common. So this is pure JNI + C++ wiring with no
build-system change.
What's verified
- train_engine.cpp compiles and links into libjllama (b9842).
- finetuneNative JNI symbol is exported; the library loads cleanly (NativeLibraryLoadSmokeTest).
- LlamaTrainer compiles through the strict Error Prone / NullAway pipeline.
The actual training run is exercised by LlamaTrainerIntegrationTest, which self-skips unless
-Dnet.ladenthin.llama.train.model=/path/to/small.gguf is set (full-model fine-tuning is
compute/memory-heavy and should not run in a default build).
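The self-skip gate described above can be sketched in plain Java. This is a minimal illustration, not the test from this PR: the class name TrainModelGate and the method resolveModel() are hypothetical, and only the property name net.ladenthin.llama.train.model comes from the description. The extra "must be an existing file" check is also an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper: decides whether the model-gated training test should run. */
final class TrainModelGate {
    /** Property name as given in the PR description. */
    static final String MODEL_PROPERTY = "net.ladenthin.llama.train.model";

    private TrainModelGate() {}

    /**
     * Returns the model path only if the property is set, non-blank,
     * and points at an existing regular file; otherwise empty (skip).
     */
    static Optional<Path> resolveModel() {
        String value = System.getProperty(MODEL_PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        Path path = Paths.get(value.trim());
        return Files.isRegularFile(path) ? Optional.of(path) : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> model = resolveModel();
        if (!model.isPresent()) {
            // Default builds land here, so no training work is started.
            System.out.println("skipped: -D" + MODEL_PROPERTY + " is unset or not a file");
            return;
        }
        System.out.println("would run fine-tuning against " + model.get());
    }
}
```

In a JUnit 5 test, the same result would typically feed an assumption such as Assumptions.assumeTrue(TrainModelGate.resolveModel().isPresent()), so the test reports as skipped rather than failed. Whether the PR's test is written that way is not stated here.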
Design
- train_engine.{h,cpp} - a self-contained native finetune(), independent of the inference
server_context; it loads its own model + context and forces the two settings upstream training
requires (no mmap → writable weights; f32 KV cache → OUT_PROD has no f16 support). It
intentionally does not call llama_backend_free(), since other live contexts in the JVM may
still depend on the initialized backend.
- LlamaTrainer - a deliberately minimal Java surface so the native path can be exercised before a
richer API is designed.
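To make the "minimal Java surface" idea concrete, the overload-and-delegate shape might look like the sketch below. Everything in it is an assumption for illustration: the class name TrainerSketch, the parameter list, the default epoch count, and the NativeFinetune interface that stands in for the real finetuneNative JNI call so the sketch runs without libjllama. It also uses instance methods for easy stubbing, whereas the PR describes static overloads; the actual LlamaTrainer signatures may differ.

```java
import java.util.Objects;

/** Illustrative only: not the LlamaTrainer class from this PR. */
final class TrainerSketch {
    /** Hypothetical default; the PR does not state one. */
    static final int DEFAULT_EPOCHS = 1;

    /** Stand-in for the JNI entry point (hypothetical signature). */
    interface NativeFinetune {
        void run(String modelPath, String corpus, String outputPath, int epochs);
    }

    private final NativeFinetune backend;

    TrainerSketch(NativeFinetune backend) {
        this.backend = Objects.requireNonNull(backend, "backend");
    }

    /** Convenience overload: delegates with the default epoch count. */
    void finetune(String modelPath, String corpus, String outputPath) {
        finetune(modelPath, corpus, outputPath, DEFAULT_EPOCHS);
    }

    /** Validates arguments on the Java side before crossing into native code. */
    void finetune(String modelPath, String corpus, String outputPath, int epochs) {
        Objects.requireNonNull(modelPath, "modelPath");
        Objects.requireNonNull(corpus, "corpus");
        Objects.requireNonNull(outputPath, "outputPath");
        if (epochs < 1) {
            throw new IllegalArgumentException("epochs must be >= 1, got " + epochs);
        }
        backend.run(modelPath, corpus, outputPath, epochs);
    }

    public static void main(String[] args) {
        // The lambda replaces the native call; it only prints what would be trained.
        TrainerSketch trainer = new TrainerSketch((m, c, o, e) ->
                System.out.println("finetune " + m + " -> " + o + " for " + e + " epoch(s)"));
        trainer.finetune("model.gguf", "some training text", "out.gguf");
    }
}
```

Rejecting bad arguments before the JNI boundary keeps failures as ordinary Java exceptions instead of native crashes, which is one reason to keep even a proof-of-concept wrapper thin but strict.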
Scope / next steps
This is a POC, not a finished feature: upstream training support is itself experimental (full or
selective fine-tune, small models). A follow-up FineTuner API could add: dataset/file input and
batching, optimizer and LoRA-target selection, a learning-rate schedule, validation split, and
progress callbacks. Opening this to confirm the approach and the in-process feasibility.